Efficient Computation of Partial-Support for Mining Interesting Itemsets
Authors
Abstract
Mining interesting itemsets is a popular topic in the data mining community. The objective of this problem is to mine all itemsets that are interesting with respect to a given interestingness measure. While considerable effort has been spent on justifying the various interestingness measures, the algorithms that mine them have not been well studied, except in the case of support, which has given rise to the well-known frequent itemset mining (FIM) problem. In this paper, we show that a certain class of interesting itemsets can be represented by functions of their partial supports. This class includes some definitions of fault-tolerant itemsets, the estimated support of itemsets in noisy data, and the bond of itemsets. As the name implies, the partial support of an itemset is the number of transactions containing some part of the given itemset. This paper addresses the problem of efficiently calculating partial supports, which leads to efficient algorithms for mining the interesting itemsets in that class. We show that there exists a recurrence relation between partial supports. Hence, we can calculate the partial supports of itemsets by simply extending any FIM algorithm, or even an existing FIM implementation. This allows us to benefit from innovations and optimizations in FIM algorithms. Theoretical analysis shows that our approaches retain the running-time complexity of the base FIM algorithms at only a small cost in space. Extensive experiments on several real-world datasets also demonstrate that algorithms based on our approach are significantly faster than previously proposed techniques for the corresponding definitions.
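To make partial support and the idea of a recurrence between partial supports concrete, here is a minimal Python sketch. It assumes that the partial supports of an itemset X form a vector p where p[k] is the number of transactions containing exactly k items of X; the paper's formal definition and its recurrence may differ. Under that assumption, adding an item a to X only moves the transactions that contain a from overlap k to overlap k+1, so the vector for X ∪ {a} follows from p and from q, the same counts restricted to transactions containing a. All function names are hypothetical, q is obtained here by a direct scan rather than by extending an FIM algorithm as the abstract describes, bond is taken as support divided by the number of transactions containing at least one item of X, and the fault-tolerant count is just one simple variant (transactions missing at most delta items of X).

```python
from typing import Iterable, List, Set


def partial_supports(db: Iterable[Set[str]], itemset: Set[str]) -> List[int]:
    """Assumed definition: p[k] = number of transactions containing exactly k items of itemset."""
    x = frozenset(itemset)
    p = [0] * (len(x) + 1)
    for t in db:
        p[len(x & t)] += 1
    return p


def extend(p: List[int], q: List[int]) -> List[int]:
    """Hypothetical recurrence: partial supports of X | {a} (a not in X) from those of X.

    p[k]: transactions with exactly k items of X.
    q[k]: transactions that contain a and exactly k items of X.
    """
    out = [0] * (len(p) + 1)
    for k, (pk, qk) in enumerate(zip(p, q)):
        out[k] += pk - qk  # transactions without a keep overlap k
        out[k + 1] += qk   # transactions with a gain one item of X | {a}
    return out


def support(p: List[int]) -> int:
    # Transactions containing every item of X.
    return p[-1]


def disjunctive_support(p: List[int]) -> int:
    # Transactions containing at least one item of X.
    return sum(p) - p[0]


def bond(p: List[int]) -> float:
    d = disjunctive_support(p)
    return support(p) / d if d else 0.0


def ft_support(p: List[int], delta: int) -> int:
    # One simple fault-tolerant variant: transactions missing at most delta items of X.
    start = max(0, len(p) - 1 - delta)
    return sum(p[start:])


db = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"c", "d"}, {"e"}]
X = {"a", "b"}
p_x = partial_supports(db, X)                         # [2, 1, 2]
q = partial_supports([t for t in db if "c" in t], X)  # [1, 0, 1]
p_xc = extend(p_x, q)                                 # [1, 2, 1, 1]
assert p_xc == partial_supports(db, X | {"c"})        # recurrence agrees with a direct scan
print(p_xc, bond(p_xc), ft_support(p_xc, 1))          # [1, 2, 1, 1] 0.25 2
```

The point of the recurrence in this sketch is that support, bond, and the fault-tolerant count are all read off the same vector, so an algorithm that already enumerates itemsets level by level only needs to carry one extra small array per itemset.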
Related articles
Efficient Incremental Mining of Top-K Frequent Closed Itemsets
In this work we study the mining of top-K frequent closed itemsets, a recently proposed variant of the classical problem of mining frequent closed itemsets where the support threshold is chosen as the maximum value sufficient to guarantee that the itemsets returned in output be at least K. We discuss the effectiveness of parameter K in controlling the output size and develop an efficient algori...
Depth-First Non-Derivable Itemset Mining
Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collecti...
SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases
The issue of maintaining privacy in frequent itemset mining has attracted considerable attention. In most of those works, only distorted data are available, which may cause many issues in the data mining process. In particular, in a dynamic environment where the distorted database is updated, it is nontrivial to mine frequent itemsets incrementally due to the high counting overhead to recompute support cou...
Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support
Real-world datasets are sparse, dirty, and contain hundreds of items. In such situations, discovering interesting rules (results) with the traditional frequent itemset mining approach, by specifying a user-defined support threshold, is not appropriate, since without any domain knowledge, setting the support threshold too large or too small can output nothing or a large number of redundant, uninteresting res...
Fast Vertical Mining Using Boolean Algebra
The vertical association rule mining algorithm is an efficient mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It avoids the drawback of scanning the database many times, as the Apriori algorithm does. In vertical mining, frequent itemsets can be represented as a set of bit vectors in memory, which enables fast computation. The...